Fast DD-classification of functional data
A fast nonparametric procedure for classifying functional data is introduced.
It consists of a two-step transformation of the original data plus a classifier
operating on a low-dimensional hypercube. The functional data are first mapped
into a finite-dimensional location-slope space and then transformed by a
multivariate depth function into the DD-plot, which is a subset of the unit
hypercube. This transformation yields a new notion of depth for functional
data. Three alternative depth functions are employed for this, as well as two
rules for the final classification on the DD-plot. The resulting classifier has
to be cross-validated over a small range of parameters only, which is
restricted by a Vapnik-Chervonenkis bound. The entire methodology does not
involve smoothing techniques, is completely nonparametric, and allows Bayes
optimality to be achieved under standard distributional settings. It is robust,
efficiently computable, and has been implemented in an R environment.
Applicability of the new approach is demonstrated by simulations as well as a
benchmark study.
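As an illustration of the DD-plot step only, here is a minimal sketch in Python (not the authors' implementation): it assumes the functional data have already been mapped to a finite-dimensional space, uses the Mahalanobis depth as a stand-in for the three depth functions employed in the paper, and replaces the cross-validated classification rules by the simple maximum-depth rule.

    # Minimal sketch of the DD-plot step, assuming already finite-dimensional data;
    # Mahalanobis depth and the maximum-depth rule are illustrative stand-ins.
    import numpy as np

    def mahalanobis_depth(points, cloud):
        """Mahalanobis depth of each row of `points` w.r.t. the data cloud `cloud`."""
        mu = cloud.mean(axis=0)
        cov_inv = np.linalg.inv(np.cov(cloud, rowvar=False))
        diff = points - mu
        md2 = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # squared Mahalanobis distances
        return 1.0 / (1.0 + md2)

    def dd_plot(points, class0, class1):
        """Map points into the unit square: (depth w.r.t. class 0, depth w.r.t. class 1)."""
        return np.column_stack([mahalanobis_depth(points, class0),
                                mahalanobis_depth(points, class1)])

    def max_depth_classify(points, class0, class1):
        d = dd_plot(points, class0, class1)
        return (d[:, 1] > d[:, 0]).astype(int)  # 1 if deeper in class 1, else 0

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        class0 = rng.normal(0.0, 1.0, size=(200, 2))
        class1 = rng.normal(2.0, 1.0, size=(200, 2))
        queries = np.array([[0.2, 0.1], [1.9, 2.2]])
        print(max_depth_classify(queries, class0, class1))  # expected: [0 1]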
Fast computation of Tukey trimmed regions and median in dimension p > 2
Given data in $\mathbb{R}^p$, a Tukey $\kappa$-trimmed region is the set of
all points that have at least Tukey depth $\kappa$ w.r.t. the data. As they are
visual, affine equivariant and robust, Tukey regions are useful tools in
nonparametric multivariate analysis. While these regions are easily defined and
interpreted, their practical use in applications has been impeded so far by the
lack of efficient computational procedures in dimension $p > 2$. We construct
two novel algorithms to compute a Tukey $\kappa$-trimmed region, a naïve
one and a more sophisticated one that is much faster than known algorithms.
Further, a strict bound on the number of facets of a Tukey region is derived.
In a large simulation study the novel fast algorithm is compared with the
na\"{i}ve one, which is slower and by construction exact, yielding in every
case the same correct results. Finally, the approach is extended to an
algorithm that calculates the innermost Tukey region and its barycenter, the
Tukey median.
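The exact algorithms of the paper are not reproduced here; the following minimal Python sketch only approximates the Tukey (halfspace) depth of a point by minimizing over random directions, which gives an upper bound on the exact depth, and uses it to test membership in a kappa-trimmed region. The number of directions and the value of kappa are arbitrary illustration choices.

    # Minimal sketch: random-direction approximation of the Tukey (halfspace) depth
    # and an approximate membership test for the kappa-trimmed region. This is not
    # the paper's exact algorithm; the result is an upper bound on the true depth.
    import numpy as np

    def approx_tukey_depth(x, data, n_dirs=1000, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        dirs = rng.normal(size=(n_dirs, data.shape[1]))
        dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)   # random unit directions
        proj_data = data @ dirs.T                             # (n, n_dirs) projections of the data
        proj_x = x @ dirs.T                                    # (n_dirs,) projections of the query
        # for each direction, the fraction of data points in the halfspace {y : u'y <= u'x}
        counts = (proj_data <= proj_x).mean(axis=0)
        return counts.min()

    def in_trimmed_region(x, data, kappa, n_dirs=1000, rng=None):
        return approx_tukey_depth(x, data, n_dirs, rng) >= kappa

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        data = rng.normal(size=(500, 3))
        print(in_trimmed_region(np.zeros(3), data, kappa=0.2, rng=rng))      # central point: True
        print(in_trimmed_region(np.full(3, 4.0), data, kappa=0.2, rng=rng))  # remote point: False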
Depth and Depth-Based Classification with R Package ddalpha
Following the seminal idea of Tukey (1975), data depth is a function that measures how close an arbitrary point of the space lies to an implicitly defined center of a data cloud. Having undergone theoretical and computational developments, it is now employed in numerous applications, with classification being the most popular one. The R package ddalpha is software designed to fuse the user's experience with recent achievements in the area of data depth and depth-based classification. ddalpha provides implementations for exact and approximate computation of the most reasonable and widely applied notions of data depth. These can further be used in the depth-based multivariate and functional classifiers implemented in the package, with the DDα-procedure as the main focus. The package is expandable with user-defined custom depth methods and separators. The implemented functions for depth visualization and the built-in benchmark procedures may also serve to provide insights into the geometry of the data and the quality of pattern recognition.
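The package itself is written in R; purely as a hedged illustration in Python (not the ddalpha API), the sketch below implements one widely applied depth notion, the spatial depth, in the plug-in style described above: any function mapping a point and a data cloud to a centrality value in [0, 1] could play the role of a user-defined custom depth.

    # Hedged illustration, not the ddalpha R API: the spatial depth of a point x is
    # 1 minus the norm of the average unit vector pointing from the data points
    # towards x, so it is close to 1 near the center of the cloud and close to 0 far away.
    import numpy as np

    def spatial_depth(x, data, eps=1e-12):
        diff = x - data                                    # vectors from data points to x
        norms = np.linalg.norm(diff, axis=1, keepdims=True)
        units = diff / np.maximum(norms, eps)              # guard against x coinciding with a data point
        return 1.0 - np.linalg.norm(units.mean(axis=0))

    if __name__ == "__main__":
        rng = np.random.default_rng(2)
        data = rng.normal(size=(300, 2))
        print(spatial_depth(np.zeros(2), data))            # near the center: close to 1
        print(spatial_depth(np.array([5.0, 5.0]), data))   # far from the cloud: close to 0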
Tailoring Mixup to Data using Kernel Warping functions
Data augmentation is an essential building block for learning efficient deep
learning models. Among all augmentation techniques proposed so far, linear
interpolation of training data points, also called mixup, has been found to be
effective across a wide range of applications. While the majority of works have
focused on selecting the right points to mix, or applying complex non-linear
interpolation, we are interested in mixing similar points more frequently and
strongly than less similar ones. To this end, we propose to dynamically change
the underlying distribution of interpolation coefficients through warping
functions, depending on the similarity between data points to combine. We
define an efficient and flexible framework to do so without losing
diversity. We provide extensive experiments for classification and regression
tasks, showing that our proposed method improves both performance and
calibration of models. Code is available at
https://github.com/ENSTA-U2IS/torch-uncertaint
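A minimal sketch of the idea under stated assumptions: the similarity measure (a Gaussian kernel on Euclidean distance) and the warping (a similarity-dependent power transform of the Beta-sampled coefficient) are illustrative placeholders, not the kernel warping functions proposed in the paper or implemented in the linked repository.

    # Minimal sketch of similarity-dependent mixup with assumed ingredients: a Gaussian
    # kernel on Euclidean distance as similarity, and a power warping that pushes the
    # interpolation coefficient towards 0.5 for similar pairs (stronger mixing) and
    # towards 0 or 1 for dissimilar pairs (weaker mixing). Not the paper's warping family.
    import numpy as np

    def similarity(x1, x2, bandwidth=1.0):
        return np.exp(-np.sum((x1 - x2) ** 2) / (2.0 * bandwidth ** 2))

    def warp(lam, sim):
        """Warp lam in [0, 1]: deviations from 0.5 shrink when sim is high, grow when low."""
        tau = 0.5 + sim                      # assumed warping strength in (0.5, 1.5]
        dev = 2.0 * abs(lam - 0.5)           # rescaled deviation in [0, 1]
        return 0.5 + np.sign(lam - 0.5) * 0.5 * dev ** tau

    def warped_mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        lam = warp(rng.beta(alpha, alpha), similarity(x1, x2))
        return lam * x1 + (1.0 - lam) * x2, lam * y1 + (1.0 - lam) * y2

    if __name__ == "__main__":
        rng = np.random.default_rng(3)
        x1, x2 = rng.normal(size=4), rng.normal(size=4)
        y1, y2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])   # one-hot labels
        print(warped_mixup(x1, y1, x2, y2, rng=rng))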
Choosing among notions of multivariate depth statistics
Classical multivariate statistics measures the outlyingness of a point by its
Mahalanobis distance from the mean, which is based on the mean and the
covariance matrix of the data. A multivariate depth function is a function
which, given a point and a distribution in d-space, measures centrality by a
number between 0 and 1, while satisfying certain postulates regarding
invariance, monotonicity, convexity and continuity. Accordingly, numerous
notions of multivariate depth have been proposed in the literature, some of
which are also robust against extremely outlying data. The departure from
classical Mahalanobis distance does not come without cost. There is a trade-off
between invariance, robustness and computational feasibility. In the last few
years, efficient exact algorithms as well as approximate ones have been
constructed and made available in R-packages. Consequently, in practical
applications the choice of a depth statistic is no longer restricted to one or
two notions by computational limits; rather, several notions are often
feasible, among which the researcher has to decide. The article debates
theoretical and practical aspects of this choice, including invariance and
uniqueness, robustness and computational feasibility. Complexity and speed of
exact algorithms are compared. The accuracy of approximate approaches like the
random Tukey depth is discussed as well as the application to large and
high-dimensional data. Extensions to local and functional depths and
connections to regression depth are briefly addressed.
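For orientation, the two notions the comparison starts from can be written down explicitly. These are the standard definitions (stated here as a reader aid, not quoted from the article), with mean $\mu$, covariance $\Sigma$ and a distribution $P$ on $\mathbb{R}^d$:

    % Mahalanobis depth: centrality decreases with the Mahalanobis distance from the mean
    D_{\mathrm{Mah}}(x) = \frac{1}{1 + (x - \mu)^{\top} \Sigma^{-1} (x - \mu)},
    \qquad
    % Tukey (halfspace) depth: smallest probability mass of a closed halfspace containing x
    D_{\mathrm{Tuk}}(x) = \inf_{\|u\| = 1} P\{\, y \in \mathbb{R}^{d} : u^{\top} y \le u^{\top} x \,\}.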